Consolidate Asset/Dandiset models; gate publish validation on datePublished#419
Draft
candleindark wants to merge 1 commit into
Draft
Consolidate Asset/Dandiset models; gate publish validation on datePublished#419candleindark wants to merge 1 commit into
candleindark wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #419 +/- ##
==========================================
+ Coverage 48.31% 48.45% +0.13%
==========================================
Files 19 19
Lines 2434 2456 +22
==========================================
+ Hits 1176 1190 +14
- Misses 1258 1266 +8
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
e632eee to
a15e2e9
Compare
…Published
Collapse the publication-specific model variants into their base classes so
that each class's schemaKey value matches its class name, which is what an
eventual LinkML translation needs to use schemaKey as a type designator
(designates_type). The three classes whose schemaKey differed from their class
name (BareAsset -> "Asset", PublishedAsset -> "Asset", PublishedDandiset ->
"Dandiset") were the blockers.
Changes (all in dandischema/models.py):
- Merge PublishedDandiset into Dandiset and PublishedAsset into Asset. The
publication-only fields (doi, publishedBy, datePublished, releaseNotes on
Dandiset; publishedBy, datePublished on Asset) become optional, and the
publication requirements move into a datePublished-gated
`check_publication_status` model validator on each class:
- when datePublished is None (a draft), the publication-only fields must be
absent;
- when datePublished is set (published), enforce the former Published*
requirements (publishedBy/url/doi presence, the stricter id/url patterns,
check_filesbytes, digest_sha256check). All violations are reported
together in one error.
dandi-archive's publish flow injects datePublished before validating, so the
gated checks fire exactly as the Published* classes did before.
- Keep BareAsset as a distinct class (Asset still inherits from it) but align
its schemaKey to Literal["BareAsset"], so both BareAsset and Asset are
schemaKey-aligned. The client (dandi-cli) is responsible for setting
schemaKey to "Asset" when uploading bare metadata as an Asset.
- Remove the Publishable mixin; add PublishedDandiset/PublishedAsset as
deprecated aliases of Dandiset/Asset for backward compatibility (dandi-cli,
dandi-archive). These will be removed in a follow-up once consumers migrate.
- Simplify DandiBaseModel.ensure_schemakey to a plain schemaKey == class-name
check now that no class intentionally diverges.
to_datacite now asserts the published precondition (datePublished set) that the
PublishedDandiset type used to guarantee. Tests updated for the merged models:
the published variants report the same missing fields as their base classes,
and the publication requirements are exercised on complete, datePublished
instances; new tests cover the publication-coherence invariant.
Generated JSON Schema diff: the draft schemas (Dandiset, Asset) gain only
optional, readOnly publication properties (nothing newly required, no pattern
tightened), so the dandi-archive Meditor is unaffected; the published-*.json
schemas become the relaxed versions, with publication strictness now enforced
by the gated Pydantic validator.
Co-Authored-By: Claude Code 2.1.161 / Claude Opus 4.8 claude-opus-4-8 <noreply@anthropic.com>
a15e2e9 to
1298b26
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Make every model class's
schemaKeyvalue equal its class name, so that an eventual LinkML translation can useschemaKeyas its type designator (designates_type). Three classes violated this (BareAsset→"Asset",PublishedAsset→"Asset",PublishedDandiset→"Dandiset"), and fixing them takes two distinct changes:PublishedDandisetmerges intoDandisetandPublishedAssetintoAsset. Publication requirements now fire via adatePublished-gated validator instead of separate classes.BareAsset.schemaKeyto"BareAsset"Stored
schemaKeyvalues in archive data are unchanged, and the old names remain as deprecated aliases sodandi-cli/dandi-archiveimports keep working. Adopting this release does, however, require one lockstepdandi-clichange (see Follow-ups).What changed (all in
dandischema/models.py)Merge
PublishedDandiset→Dandiset,PublishedAsset→Asset. Publication-only fields (doi,publishedBy,datePublished,releaseNotesonDandiset;publishedBy,datePublishedonAsset) become optional, and the publication requirements move into adatePublished-gatedcheck_publication_statusmodel validator:datePublishedisNone(draft) → publication-only fields must be absent;datePublishedset (published) → enforce the formerPublished*rules (publishedBy/url/doipresence, stricterid/urlpatterns,check_filesbytes,digest_sha256check), reporting all violations in one error.dandi-archive injects
datePublishedbefore validating, so the gated checks fire exactly as thePublished*classes did.Keep
BareAsset(Assetstill inherits it); align itsschemaKeytoLiteral["BareAsset", "Asset"](soAssetnarrows to"Asset", theContributor/Activityidiom).Remove the
Publishablemixin; addPublishedDandiset/PublishedAssetas deprecated aliases.Simplify
ensure_schemakeyto a plainschemaKey == class-namecheck.to_dataciteasserts the published precondition (datePublishedset) thePublishedDandisettype used to guarantee.Why gate on
datePublishedrather than an explicit (context-triggered) validatorThe publish-only checks could instead be triggered explicitly by the caller (e.g. a Pydantic validation-context flag passed by the publish flow) rather than by the presence of
datePublished. We chose data-driven gating because:gen-json-schemaemits it asif/then). A caller-supplied validation mode has no LinkML equivalent — LinkML describes only the data instance, not how validation is invoked — so it would have to live as hand-written Python forever, outside the source of truth, working against the migration this change is meant to enable.validate()and direct construction (Dandiset(**data)/model_validate). A context flag only ridesmodel_validate(..., context=...): the tersePublishedDandiset(**meta)form (used byto_dataciteand in tests) cannot pass one, so it would silently skip the publish checks. Beyond the call-site rewrites that would force, a constructor whose enforcement quietly depends on how it is invoked is a lasting programming hazard.datePublishedpresent ⟺ published is a real invariant: a draft never carries it, and the archive injects it only when building a publishable version. "Published" is a state of the record, not a mode a caller opts into.obj.datePublished is not None; with a context flag, whether an object was validated as published is ephemeral to the call and recorded nowhere on the object.Compatibility
Dandiset/Assetgain only optional,readOnlypublication properties — nothing newly required, no pattern tightened — so the dandi-archive Meditor is unaffected (it hidesreadOnlyprops). Thepublished-*.jsonschemas become the relaxed versions, with publication strictness enforced by the gated validator.PublishedDandiset/PublishedAsset/BareAssetimports continue to work indandi-cli/dandi-archive.Follow-ups (separate PRs)
schemaKey = "Asset"on bare metadata at upload, sinceBareAsset.schemaKeyis now"BareAsset"and the server does not normalize it.views/schema.pymodel mapping).Test plan
tox -e py3(269 passed, 17 skipped — skips are environment-only: noDOI_PREFIX/ instance name not "DANDI")tox -e lint,typingdatePublished-gated publish requirements and the coherence invariant